Click Here!
home account info subscribe login search My ITKnowledge FAQ/help site map contact us


 
Brief Full
 Advanced
      Search
 Search Tips
To access the contents, click the chapter and section titles.

Oracle Performance Tuning and Optimization
(Publisher: Macmillan Computer Publishing)
Author(s): Edward Whalen
ISBN: 067230886x
Publication Date: 04/01/96

Bookmark It

Search this book:
 
Previous Table of Contents Next


Physical Data Layout

This section looks at how the data in a data warehouse should be configured. First, it looks at how to lay out the data on traditional disks; then it looks at disk arrays. I recommend using disk arrays if at all possible; the ease of use and performance benefits are worth the cost of the array.

The main goals in designing the physical data layout are to balance the I/O across all the disks that are randomly accessed and to isolate the sequential I/O. The data warehousing system typically involves loading and processing of data, which causes moderate to significant use of the redo logs. By isolating the redo log files to their own disk volumes, you can take advantage of the sequential nature of their I/Os.

In a data warehouse, the majority (if not all) of the data files are accessed in a random fashion but can take advantage of multiblock reads. To take advantage of multiblock reads, stripe the data over as many disks as necessary to achieve I/O rates your disk drives can handle.

In other chapters of this book, I have recommended that you use smaller disks rather than larger ones to maximize disk count. I do not recommend that here. Because of the enormous amounts of data being stored in the data warehouse, it is probably most economical to purchase fairly large disks. Don’t go overboard and buy only a few enormous disks, but don’t purchase the smallest ones either.

Traditional Disks

The layout for a data warehouse can be large and difficult to manage. A minimal configuration should look something like this:


Element (# of Disks) Comments

System (1+) The system disk is used for the operating system, swap file (if applicable), user files, and Oracle binaries.
Redo log (2+) Because there is moderate to heavy redo log activity, it is best to have at least two disk drives so that you can mirror the logs. When you factor in archiving, you may be better off with at least four disks.
Archive logs (1+) You can take advantage of the sequential nature of the archive process by isolating the archive log files to their own set of disks.
Data files The number of disks you need for data is determined by the amount of random I/O your user community generates and the size of the database. In this type of environment, the number of disks can be significant.
Index files The number of disks needed for indexes is determined by the size of the indexes and the number of I/Os to the indexes. In this type of environment, the number of index disk drives can also be significant.

Both the data files and the indexes should be striped over as many disk drives as necessary to achieve optimal I/O rates on those disks. From Chapter 14, “Advanced Disk I/O Concepts,” remember that you can only push a disk drive to a maximum random I/O rate.

As you have seen in previous chapters, the data and indexes can be striped across the disks using Oracle or RAID striping or a combination of the two. With large data warehousing systems, I recommended OS or hardware striping. To take advantage of the Oracle Parallel Query option, you will benefit from having several large extents. An optimal configuration may consist of several data files residing on the same large, striped volume. If you do not use Oracle striping and build one large extent, you may not see the full benefits of the Parallel Query option.

I prefer a hardware disk array to manual Oracle striping primarily because the disk array provides excellent performance and is easy to use. When you use a disk array, the task of distributing I/Os can be greatly simplified.

Disk Arrays

The layout for the data warehouse on RAID volumes is much simpler than it is on traditional disk drives. A minimal configuration should look something like this:


Element (# of Volumes) Comments

System (1+) The system disk is used for the operating system, swap file (if applicable), user files, and Oracle binaries. If possible, you should mirror this disk for additional fault tolerance.
Redo log (2+) Because there is moderate to heavy redo log activity, it is best to have at least two disk drives so that you can mirror the logs. This volume should be made up of at least two physical disks using RAID-1. By using only one log volume, performance degrades during archiving because the sequential nature of the log writes is disrupted.
Archive logs (1+) The number of disks needed for the archive log files is determined by the amount of data you need to archive. This data can be written to tape as necessary. You can take advantage of the sequential nature of the archive process by isolating the archive log files to their own set of disks.
Data and index (1+) Because you always have concurrent access to disks in a disk array, it is not necessary to split the data and indexes into separate volumes. The number of disks you need for data and index is determined by the amount of random I/O your user community generates and the size of the database. This logical volume may consist of many disk drives. Performance is enhanced by using as few volumes as possible and striping over as many drives as possible.

Both the data files and the indexes should be striped over as many disk drives as necessary to achieve optimal I/O rates on those disks. From Chapter 14, “Advanced Disk I/O Concepts,” remember that you can only push a disk drive to a maximum random I/O rate.


Previous Table of Contents Next


Products |  Contact Us |  About Us |  Privacy  |  Ad Info  |  Home

Use of this site is subject to certain Terms & Conditions, Copyright © 1996-2000 EarthWeb Inc.
All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited.